Cyberinfrastructure exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Localizing triplet periodicity in DNA and cDNA sequences

Internal identifier: 000886 (Main/Exploration); previous: 000885; next: 000887

Authors: Liya Wang [United States]; Lincoln D. Stein [United States, Canada]

RBID : PMC:2992068

Abstract

Background

The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) due to the fact that coding exons contain a series of three-nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier Transform is applied to each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in that of non-coding segments. This property has been used to identify the locations of protein-coding genes in unannotated sequences. The method is fast and requires no training. However, the need to compute the Fourier Transform across a segment (window) of arbitrary size limits the accuracy with which one can localize TP boundaries. Here, we report a technique that provides higher-resolution identification of these boundaries, and use the technique to explore the biological correlates of TP regions in the genome of the model organism C. elegans.
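
The windowed Fourier analysis described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the indicator-sequence encoding, the function names (`period3_power`, `sliding_power`, `toy_sequence`) and the toy "coding" model with biased codon positions are all assumptions introduced here.

```python
import cmath
import random

BASES = "ACGT"


def period3_power(segment):
    """Spectral power at frequency 1/3, summed over the four base-indicator
    sequences and normalised by the segment length."""
    total = 0.0
    for base in BASES:
        # DFT coefficient at frequency 1/3 of the 0/1 indicator of `base`.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, nt in enumerate(segment)
            if nt == base
        )
        total += abs(coeff) ** 2
    return total / len(segment)


def sliding_power(seq, window, step):
    """Return (window start, period-3 power) for each window along seq."""
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def toy_sequence(seed=0):
    """Hypothetical test sequence: 300 random nt, 100 biased 'codons', 300 random nt."""
    rng = random.Random(seed)

    def noncoding(n):
        return "".join(rng.choice(BASES) for _ in range(n))

    def coding(n_codons):
        codons = []
        for _ in range(n_codons):
            # Assumed bias: G-rich first position, C-rich third position.
            codons.append(rng.choice("GGGA") + rng.choice(BASES) + rng.choice("CCCT"))
        return "".join(codons)

    return noncoding(300) + coding(100) + noncoding(300)


if __name__ == "__main__":
    for start, power in sliding_power(toy_sequence(), window=300, step=100):
        print(start, round(power, 2))
```

Windows that only partly overlap the biased region give intermediate values, which is one way to see why a fixed window size blurs the apparent TP boundary.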

Results

Using both simulated TP signals and the real C. elegans sequence F56F11 as an example, we demonstrate that (1) the Modified Wavelet Transform (MWT) defines the boundaries of TP regions better than the conventional Short Time Fourier Transform (STFT); (2) the scale parameter (a) of the MWT determines the precision of TP boundary localization: larger values of a give sharper TP boundaries but result in a lower signal-to-noise ratio; (3) RNA splice sites have weaker TP signals than coding regions; (4) TP signals in coding regions can be destroyed or recovered by frame-shift mutations; and (5) 6 bp periodicities in introns and intergenic regions can generate false-positive signals, which can be removed with a 6 bp MWT.
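
Point (4) can be illustrated with a small toy calculation. The sketch below is an assumption-laden example, not the paper's analysis: it uses a perfectly periodic sequence ("ATG" repeated), a plain Fourier coefficient at frequency 1/3 rather than the MWT, and a hypothetical helper `period3_power`. A one-base deletion puts the two halves of the window out of phase, so their contributions partly cancel; a compensating insertion a few bases downstream restores most of the signal.

```python
import cmath

BASES = "ACGT"


def period3_power(segment):
    """Spectral power at frequency 1/3, summed over the four base-indicator
    sequences and normalised by the segment length."""
    total = 0.0
    for base in BASES:
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k / 3)
            for k, nt in enumerate(segment)
            if nt == base
        )
        total += abs(coeff) ** 2
    return total / len(segment)


# Perfectly periodic toy "coding" sequence (an idealisation, 180 nt).
intact = "ATG" * 60

# Frame-shift: delete the base at index 90, shifting everything downstream by one.
shifted = intact[:90] + intact[91:]

# Compensating frame-shift: insert one base at index 99, so that only nine
# bases between the two mutations remain out of frame.
restored = shifted[:99] + "A" + shifted[99:]

if __name__ == "__main__":
    print("intact:  ", round(period3_power(intact), 2))
    print("shifted: ", round(period3_power(shifted), 2))
    print("restored:", round(period3_power(restored), 2))
```

In this idealised case the window-wide power drops to roughly a quarter after the deletion and returns to most of its original value after the second mutation; real coding sequence is far noisier, so the effect there would be less clean.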

Conclusions

The MWT can provide more precise TP boundaries than the STFT, and the boundaries can be further refined by a larger-scale MWT. Subtraction of 6 bp periodicity signals reduces the number of false positives. Experimentally introduced frame-shift mutations help recover TP signals that have been lost through possible ancient frame-shifts. More importantly, the TP signal has the potential to be used to detect splice junctions in fully spliced mRNA sequences.


DOI: 10.1186/1471-2105-11-550
PubMed: 21059240
PubMed Central: 2992068



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Localizing triplet periodicity in DNA and cDNA sequences</title>
<author>
<name sortKey="Wang, Liya" sort="Wang, Liya" uniqKey="Wang L" first="Liya" last="Wang">Liya Wang</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY, 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY, 11724</wicri:regionArea>
<wicri:noRegion>11724</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Stein, Lincoln D" sort="Stein, Lincoln D" uniqKey="Stein L" first="Lincoln D" last="Stein">Lincoln D. Stein</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY, 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY, 11724</wicri:regionArea>
<wicri:noRegion>11724</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I2">Ontario Institute for Cancer Research, 101 College St., Suite 800, Toronto, ON, M5G0A3, Canada</nlm:aff>
<country xml:lang="fr">Canada</country>
<wicri:regionArea>Ontario Institute for Cancer Research, 101 College St., Suite 800, Toronto, ON, M5G0A3</wicri:regionArea>
<wicri:noRegion>M5G0A3</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">21059240</idno>
<idno type="pmc">2992068</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2992068</idno>
<idno type="RBID">PMC:2992068</idno>
<idno type="doi">10.1186/1471-2105-11-550</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000481</idno>
<idno type="wicri:Area/Pmc/Curation">000481</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000571</idno>
<idno type="wicri:Area/Ncbi/Merge">000174</idno>
<idno type="wicri:Area/Ncbi/Curation">000174</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000174</idno>
<idno type="wicri:Area/Main/Merge">000889</idno>
<idno type="wicri:Area/Main/Curation">000886</idno>
<idno type="wicri:Area/Main/Exploration">000886</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Localizing triplet periodicity in DNA and cDNA sequences</title>
<author>
<name sortKey="Wang, Liya" sort="Wang, Liya" uniqKey="Wang L" first="Liya" last="Wang">Liya Wang</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY, 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY, 11724</wicri:regionArea>
<wicri:noRegion>11724</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Stein, Lincoln D" sort="Stein, Lincoln D" uniqKey="Stein L" first="Lincoln D" last="Stein">Lincoln D. Stein</name>
<affiliation wicri:level="1">
<nlm:aff id="I1">Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY, 11724, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Cold Spring Harbor Laboratory, Williams #5, Cold Spring Harbor, NY, 11724</wicri:regionArea>
<wicri:noRegion>11724</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<nlm:aff id="I2">Ontario Institute for Cancer Research, 101 College St., Suite 800, Toronto, ON, M5G0A3, Canada</nlm:aff>
<country xml:lang="fr">Canada</country>
<wicri:regionArea>Ontario Institute for Cancer Research, 101 College St., Suite 800, Toronto, ON, M5G0A3</wicri:regionArea>
<wicri:noRegion>M5G0A3</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">BMC Bioinformatics</title>
<idno type="eISSN">1471-2105</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<sec>
<title>Background</title>
<p>The protein-coding regions (coding exons) of a DNA sequence exhibit a triplet periodicity (TP) due to the fact that coding exons contain a series of three-nucleotide codons that encode specific amino acid residues. Such periodicity is usually not observed in introns and intergenic regions. If a DNA sequence is divided into small segments and a Fourier Transform is applied to each segment, a strong peak at frequency 1/3 is typically observed in the Fourier spectrum of coding segments, but not in that of non-coding segments. This property has been used to identify the locations of protein-coding genes in unannotated sequences. The method is fast and requires no training. However, the need to compute the Fourier Transform across a segment (window) of arbitrary size limits the accuracy with which one can localize TP boundaries. Here, we report a technique that provides higher-resolution identification of these boundaries, and use the technique to explore the biological correlates of TP regions in the genome of the model organism
<italic>C. elegans</italic>
.</p>
</sec>
<sec>
<title>Results</title>
<p>Using both simulated TP signals and the real
<italic>C. elegans </italic>
sequence F56F11 as an example, we demonstrate that (1) the Modified Wavelet Transform (MWT) defines the boundaries of TP regions better than the conventional Short Time Fourier Transform (STFT); (2) the scale parameter (a) of the MWT determines the precision of TP boundary localization: larger values of a give sharper TP boundaries but result in a lower signal-to-noise ratio; (3) RNA splice sites have weaker TP signals than coding regions; (4) TP signals in coding regions can be destroyed or recovered by frame-shift mutations; and (5) 6 bp periodicities in introns and intergenic regions can generate false-positive signals, which can be removed with a 6 bp MWT.</p>
</sec>
<sec>
<title>Conclusions</title>
<p>The MWT can provide more precise TP boundaries than the STFT, and the boundaries can be further refined by a larger-scale MWT. Subtraction of 6 bp periodicity signals reduces the number of false positives. Experimentally introduced frame-shift mutations help recover TP signals that have been lost through possible ancient frame-shifts. More importantly, the TP signal has the potential to be used to detect splice junctions in fully spliced mRNA sequences.</p>
</sec>
</div>
</front>
<back>
</back>
</TEI>
<affiliations>
<list>
<country>
<li>Canada</li>
<li>États-Unis</li>
</country>
</list>
<tree>
<country name="États-Unis">
<noRegion>
<name sortKey="Wang, Liya" sort="Wang, Liya" uniqKey="Wang L" first="Liya" last="Wang">Liya Wang</name>
</noRegion>
<name sortKey="Stein, Lincoln D" sort="Stein, Lincoln D" uniqKey="Stein L" first="Lincoln D" last="Stein">Lincoln D. Stein</name>
</country>
<country name="Canada">
<noRegion>
<name sortKey="Stein, Lincoln D" sort="Stein, Lincoln D" uniqKey="Stein L" first="Lincoln D" last="Stein">Lincoln D. Stein</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/CyberinfraV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000886 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000886 | SxmlIndent | more

To add a link to this page in the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    CyberinfraV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:2992068
   |texte=   Localizing triplet periodicity in DNA and cDNA sequences
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:21059240" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a CyberinfraV1 

This area was generated with Dilib version V0.6.25.
Data generation: Thu Oct 27 09:30:58 2016. Site generation: Sun Mar 10 23:08:40 2024